Inference of multiple regression coefficients

This post covers inference for multiple regression coefficients, from Statistics for Engineers and Scientists by William Navidi.

Basic Ideas

  • The Statistics $s^2$, $R^2$, and $F$

    • The estimated error variance is given by

      $s^2 = \frac{\sum_{i=1}^n (y_i - \hat y_i)^2}{n - p - 1} = \frac{SSE}{n - p - 1}$

    • In the case of multiple regression, we are estimating $p + 1$ coefficients rather than just two. Thus the residuals tend to be smaller still, so we must divide

      $\sum_{i=1}^n (y_i - \hat y_i)^2$

      by a still smaller denominator.

    • It turns out that the appropriate denominator is equal to the number of observations $(n)$ minus the number of parameters in the model $(p + 1)$.

    • The estimated variance $\hat s^2_{\beta_i}$ of each least-squares coefficient $\hat \beta_i$ is computed by multiplying $s^2$ by a rather complicated function of the variables $x_{ij}$.

    • When assumptions 1 through 4 are satisfied, the quantity

      $\frac{\hat \beta_i - \beta_i}{\hat s_{\beta_i}}$

      has a Student’s t distribution with $n - p - 1$ degrees of freedom. The number of degrees of freedom is equal to the denominator used to compute the estimated error variance $s^2$.

    • This statistic is used to compute confidence intervals and to perform hypothesis tests on the values $\beta_i$, just as in simple linear regression (see the first sketch at the end of this post).

    • In simple linear regression, the coefficient of determination, $r^2$, measures the goodness of fit of the linear model. The goodness-of-fit statistic in multiple regression is a quantity denoted $R^2$, which is also called the coefficient of determination.

    • In simple linear regression, a test of the null hypothesis $\beta_1 = 0$ is almost always made. If this hypothesis is not rejected, then the linear model may not be useful. The analogous null hypothesis in multiple regression is $H_0 : \beta_1 = \beta_2 = \ldots = \beta_p = 0$.

    • This is a very strong hypothesis. It says that none of the independent variables has any linear relationship with the dependent variable. In practice, the data usually provide sufficient evidence to reject this hypothesis.

    • The value of $R^2$ is calculated in the same way as is $r^2$ in simple linear regression: \(R^2 = \frac{\sum_{i=1}^n (y_i - \overline y)^2 - \sum_{i=1}^n (y_i - \hat y_i)^2}{\sum_{i=1}^n (y_i - \overline y)^2} = \frac{SST - SSE}{SST} = \frac{SSR}{SST}\)

    • The test statistic for this hypothesis is \(F = \frac{\left[\sum_{i=1}^n (y_i - \overline y)^2 - \sum_{i=1}^n (y_i - \hat y_i)^2\right]/p}{\left[\sum_{i=1}^n (y_i - \hat y_i)^2\right]/(n - p - 1)} = \frac{[SST - SSE]/p}{SSE/(n - p - 1)} = \frac{SSR/p}{SSE/(n - p - 1)}\)

    • This is an $F$ statistic; its null distribution is $F_{p,\, n-p-1}$. Note that the denominator of the $F$ statistic is $s^2$. The subscripts $p$ and $n - p - 1$ are the degrees of freedom for the $F$ statistic (both $R^2$ and this statistic are computed in the first sketch at the end of this post).

    • Slightly different versions of the $F$ statistic can be used to test weaker null hypotheses.

    • In particular, given a model with independent variables $x_1,\ldots, x_p$, we sometimes want to test the null hypothesis that some of them (say $x_{k+1},\ldots, x_p$) are not linearly related to the dependent variable.

    • To do this, a version of the $F$ statistic can be constructed that will test the null hypothesis $H_0 : \beta_{k+1} = \ldots = \beta_p = 0$ (see the second sketch below).
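
The following is a minimal sketch, in Python with NumPy and SciPy, of how these quantities can be computed directly from their definitions. The data set, variable names, and 95% confidence level are made up for illustration and are not from Navidi's text; in matrix form, the "rather complicated function of the $x_{ij}$" mentioned above is the corresponding diagonal entry of $s^2 (X^T X)^{-1}$.

```python
import numpy as np
from scipy import stats

# Hypothetical data (not from the textbook): n = 8 observations, p = 2 predictors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([5.0, 3.0, 8.0, 1.0, 9.0, 4.0, 7.0, 2.0])
y = np.array([7.2, 6.1, 11.8, 6.9, 14.3, 11.2, 14.9, 11.6])

n, p = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])     # design matrix with intercept column

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # least-squares estimates
y_hat = X @ beta_hat
SSE = np.sum((y - y_hat) ** 2)
s2 = SSE / (n - p - 1)                        # estimated error variance s^2

# Estimated variances of the coefficients: s^2 times the diagonal of (X'X)^{-1}.
se_beta = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

# t-based 95% confidence intervals and tests of H0: beta_i = 0,
# using n - p - 1 degrees of freedom.
t_crit = stats.t.ppf(0.975, df=n - p - 1)
for i, (b, se) in enumerate(zip(beta_hat, se_beta)):
    t_stat = b / se
    p_val = 2 * stats.t.sf(abs(t_stat), df=n - p - 1)
    print(f"beta_{i}: {b:7.3f}  SE={se:.3f}  "
          f"95% CI=({b - t_crit * se:.3f}, {b + t_crit * se:.3f})  "
          f"t={t_stat:.2f}  p={p_val:.3f}")

# Goodness of fit and the overall F test of H0: beta_1 = ... = beta_p = 0.
SST = np.sum((y - y.mean()) ** 2)
R2 = (SST - SSE) / SST                        # R^2 = SSR / SST
F = ((SST - SSE) / p) / (SSE / (n - p - 1))   # null distribution F_{p, n-p-1}
p_val_F = stats.f.sf(F, p, n - p - 1)
print(f"s^2={s2:.3f}  R^2={R2:.3f}  F={F:.2f}  p={p_val_F:.4f}")
```

Writing the formulas out keeps the correspondence with the expressions above explicit; in practice a regression routine (for example, statsmodels) reports the same quantities.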
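A second sketch, again with simulated data of my own rather than an example from the book, illustrates the partial $F$ test of $H_0 : \beta_{k+1} = \ldots = \beta_p = 0$: fit the full model and the reduced model that drops $x_{k+1}, \ldots, x_p$, then compare the reduction in SSE to $s^2$ from the full model. The null distribution is $F_{p-k,\, n-p-1}$.

```python
import numpy as np
from scipy import stats

# Simulated example (not from the textbook): p = 3 predictors, test whether
# the last p - k = 2 of them can be dropped.
rng = np.random.default_rng(0)
n, p, k = 30, 3, 1
X = rng.normal(size=(n, p))
# The response depends only on x_1, so H0: beta_2 = beta_3 = 0 is true here.
y = 2.0 + 1.5 * X[:, 0] + rng.normal(scale=0.5, size=n)

def sse(X_sub, y):
    """SSE from a least-squares fit that includes an intercept."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum((y - A @ beta) ** 2)

sse_full = sse(X, y)            # all p predictors
sse_reduced = sse(X[:, :k], y)  # only x_1, ..., x_k

# F = [(SSE_reduced - SSE_full) / (p - k)] / [SSE_full / (n - p - 1)],
# with null distribution F_{p-k, n-p-1}.
F = ((sse_reduced - sse_full) / (p - k)) / (sse_full / (n - p - 1))
p_val = stats.f.sf(F, p - k, n - p - 1)
print(f"partial F = {F:.3f}, p-value = {p_val:.3f}")
```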